fix: Major refactoring of Polling, Retry and Timeout logic #462
Merged
parthea merged 12 commits into googleapis:main on Nov 10, 2022
This is in response to https://freeman.vc/notes/aws-vs-gcp-reliability-is-wildly-different, which triggered an investigation of the whole Polling/Retry/Timeout behavior in Python GAPIC clients and revealed many fundamental flaws in its implementation.

To properly describe the refactoring this PR does, we need to stick to rigorous terminology, as vague definitions of retries, timeouts, polling and related concepts seem to be the main source of the present bugs and of the overall confusion among both users and creators of the library. Please check the documentation of the `google.api_core.retry.Retry` class and the `google.api_core.future.polling.PollingFuture.result()` method for the proper definitions and context.

Note: the overall semantics around Polling, Retry and Timeout remain quite confusing even after the refactoring (although they are now more or less rigorously defined), but it is as clean as I could make it while still maintaining backward compatibility of the whole library.

The quick summary of the changes in this PR:

1) Properly define and fix the application of the Deadline and Timeout concepts. Please check the updated documentation for the `google.api_core.retry.Retry` class for the actual definitions. Originally `deadline` was used to represent timeouts, conflating the two concepts. As a result, this PR replaces `deadline` arguments with `timeout` ones in as backward-compatible a manner as possible (i.e. backward compatible in all practical applications).

2) Properly define RPC Timeout, Retry Timeout and Polling Timeout, and how a generic Timeout concept (aka Logical Timeout) is mapped to one of those depending on the context. Please check the `google.api_core.retry.Retry` class documentation for details.

3) Properly define and fix the application of the Retry and Polling concepts. Please check the updated documentation for `google.api_core.future.polling.PollingFuture.result()` for details.

4) Separate the `retry` and `polling` configurations for the Polling future, as these are two different concepts (although both operate on the `Retry` class). Originally both retry and polling configurations were controlled by a single `retry` parameter, merging the configuration of how "rpc error responses" and how "operation not completed" responses are supposed to be handled.

5) For the config properties `Retry` (including `Retry Timeout`), `Polling` (including `Polling Timeout`) and `RPC Timeout`, fix and properly define how each of them gets configured and which config takes precedence in case of a conflict (check the `PollingFuture.result()` method documentation for details). Each of those properties can be specified as follows: directly provided by the user for each call, specified at gapic generation time from config values in the `grpc_service_config.json` file (for Retry and RPC Timeout) and the `gapic.yaml` file (for Polling), or provided as hard-coded basic default values in the python-api-core library itself.

6) Fix the per-call polling config propagation logic (the polling/retry configs supplied to `PollingFuture.result()` used to be ignored for the actual call).

7) Deprecate the usage of the `deadline` terminology in the whole library and backward-compatibly replace it with `timeout`. This is essential, as what has been called "deadline" in this library was actually "timeout" as it is defined in the `google.api_core.retry.Retry` class documentation.

8) Deprecate `ExponentialTimeout`, `ConstantTimeout` and related logic, as those are outdated concepts and are not consistent with the other GAPIC languages. Replace them with `TimeToDeadlineTimeout` to be consistent with how the rest of the languages do it.

9) Deprecate `google.api_core.operations_v1.config`, as it is an outdated concept and self-inconsistent (all gapic clients provide configuration in code). The configs are directly provided in code instead.

10) Switch the randomized delay calculation from `delay` being treated as the expected value of `randomized_delay` to `delay` being treated as the maximum value of `randomized_delay` (i.e. the new expected value of `randomized_delay` is `delay / 2`). See the `exponential_sleep_generator()` method implementation for details. This is needed to make the Python implementation of exponential backoff for retries and polling consistent with the rest of the GAPIC languages. Also fix the uncontrollable growth of the `delay` value: since it is subject to exponential growth, `delay` was quickly reaching the "infinity" value, and the whole thing was not failing only because Python is a very forgiving language that allows multiplying "infinity" by a number (`inf * number = inf`) instead of overflowing to a (most likely) negative number.

11) Fix URL construction in `OperationsRestTransport`. Without this fix the polling logic for the REST transport was completely broken (this does not affect the Compute client, as that one has custom LRO).

12) Last but not least: change the default values for the Polling logic to the following: `initial=1.0` (same as before), `maximum=20.0` (was `60`), `multiplier=1.5` (was `2.0`), `timeout=900` (was `120`, but due to the timeout resolution logic was actually `None`, i.e. infinity). This, in conjunction with the changed calculation of the randomized delay (its expected value now being `delay / 2`), makes the polling logic much less aggressive in increasing the delays between polling iterations, so LROs return much earlier for users on average, while still keeping a healthy balance between the strain put on both client and server by polling and the responsiveness of LROs for the user.

*The design doc summarising all the changes and reasons for them is in progress.
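The backoff behavior described in items 10 and 12 can be illustrated with a minimal sketch. It is not the library's code: the function below only mirrors the described semantics (`delay` is the upper bound of a uniformly random sleep, and `delay` is capped at `maximum` so it cannot grow without bound), and `delay_caps` is a hypothetical helper added here purely to compare the old and new polling defaults.

```python
import itertools
import random


def exponential_sleep_generator(initial, maximum, multiplier=2.0):
    """Yield randomized sleep intervals with capped exponential backoff.

    `delay` is the upper bound of each interval, so the expected sleep is
    `delay / 2`. Capping `delay` at `maximum` on every step keeps it from
    growing towards infinity.
    """
    delay = min(initial, maximum)
    while True:
        yield random.uniform(0.0, delay)
        delay = min(delay * multiplier, maximum)


def delay_caps(initial, maximum, multiplier, count):
    """Hypothetical helper: the first `count` upper bounds used above."""
    caps, delay = [], min(initial, maximum)
    for _ in range(count):
        caps.append(delay)
        delay = min(delay * multiplier, maximum)
    return caps


# Upper bounds with the new polling defaults vs. the previous ones.
new_caps = delay_caps(initial=1.0, maximum=20.0, multiplier=1.5, count=10)
old_caps = delay_caps(initial=1.0, maximum=60.0, multiplier=2.0, count=10)
print("new:", new_caps)
print("old:", old_caps)

# A few actual (randomized) sleeps under the new defaults.
samples = list(itertools.islice(exponential_sleep_generator(1.0, 20.0, 1.5), 5))
print("samples:", samples)
```

With the new defaults the upper bound settles at 20 seconds rather than 60, and each actual sleep is drawn uniformly below that bound.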
This was referenced Oct 20, 2022
vam-google
added a commit
to vam-google/gapic-generator-python
that referenced
this pull request
Oct 27, 2022
Also fix LRO for REST transport. This PR makes generated gapics honor the timeout values from grpc_service_config.json instead of overriding them with None (which means no timeout). It is basically a direct fix for googleapis#1477. This PR depends on googleapis/python-api-core#462, and expects the `setup.py.j2` templates to be updated after googleapis/python-api-core#462 gets pushed and released with a new version.
atulep
reviewed
Oct 28, 2022
atulep
reviewed
Oct 31, 2022
atulep
reviewed
Oct 31, 2022
atulep
reviewed
Oct 31, 2022
atulep
reviewed
Oct 31, 2022
atulep
reviewed
Oct 31, 2022
atulep
reviewed
Oct 31, 2022
Contributor
Author
@atulep Addressed your comments, PTAL.
atulep
reviewed
Nov 1, 2022
atulep
reviewed
Nov 1, 2022
atulep
reviewed
Nov 1, 2022
atulep
reviewed
Nov 1, 2022
parthea
approved these changes
Nov 10, 2022
parthea
pushed a commit
that referenced
this pull request
Dec 1, 2022
* fix: Major refactoring and fix for Polling, Retry and Timeout logic
* fix ci failures (mainly sphinx errors)
* remove unused code
* fix typo
* Pin pytest version to <7.2.0
* reformat code
* address pr feedback
* address PR feedback
* address pr feedback
* Update google/api_core/future/polling.py
  Co-authored-by: Victor Chudnovsky <vchudnov@google.com>
* Apply documentation suggestions from code review
  Co-authored-by: Victor Chudnovsky <vchudnov@google.com>
* Address PR feedback
  Co-authored-by: Victor Chudnovsky <vchudnov@google.com>
parthea
added a commit
that referenced
this pull request
Dec 1, 2022
…ch (#474)
* fix: Major refactoring of Polling, Retry and Timeout logic (#462)
* fix: Major refactoring and fix for Polling, Retry and Timeout logic
* fix ci failures (mainly sphinx errors)
* remove unused code
* fix typo
* Pin pytest version to <7.2.0
* reformat code
* address pr feedback
* address PR feedback
* address pr feedback
* Update google/api_core/future/polling.py
  Co-authored-by: Victor Chudnovsky <vchudnov@google.com>
* Apply documentation suggestions from code review
  Co-authored-by: Victor Chudnovsky <vchudnov@google.com>
* Address PR feedback
  Co-authored-by: Victor Chudnovsky <vchudnov@google.com>
* feat: Allow representing enums with their unqualified symbolic names in headers (#465)
* feat: Allow non-fully-qualified enums in routing headers
* Rename s/fully_qualified_enums/qualified_enums/g for correctness
* chore: minor tweaks
* chore: Temporary workaround for pytest in noxfile.
* Fix import order
* bring coverage to 100%
* lint
* 🦉 Updates from OwlBot post-processor
  See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
* remove replacement in owlbot.py causing lint failure
  Co-authored-by: Anthonios Partheniou <partheniou@google.com>
  Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
* chore(python): update release script dependencies (#472)
* chore(python): drop flake8-import-order in samples noxfile
  Source-Link: googleapis/synthtool@6ed3a83
  Post-Processor: gcr.io/cloud-devrel-public-resources/owlbot-python:latest@sha256:3abfa0f1886adaf0b83f07cb117b24a639ea1cb9cffe56d43280b977033563eb
* drop flake8-import-order
* lint
* use python 3.9 for docs
* resolve mypy error
* update python version for lint
* fix lint
* fix lint
  Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
  Co-authored-by: Anthonios Partheniou <partheniou@google.com>
  Co-authored-by: Vadym Matsishevskyi <25311427+vam-google@users.noreply.github.com>
  Co-authored-by: Victor Chudnovsky <vchudnov@google.com>
  Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
  Co-authored-by: gcf-owl-bot[bot] <78513119+gcf-owl-bot[bot]@users.noreply.github.com>
parthea
added a commit
to googleapis/gapic-generator-python
that referenced
this pull request
Dec 5, 2022
* fix: Fix timeout default values
  Also fix LRO for REST transport. This PR makes generated gapics honor the timeout values from grpc_service_config.json instead of overriding them with None (which means no timeout). It is basically a direct fix for #1477. This PR depends on googleapis/python-api-core#462, and expects the `setup.py.j2` templates to be updated after googleapis/python-api-core#462 gets pushed and released with a new version.
* rename uri_prefix to path_prefix to match corresponding python-api-core change
* fix unnecessary `gapic_v1.method.DEFAULT` in rest stubs
* fix(deps): require google-api-core >=1.34.0
* fix(deps): require google-api-core >=2.11.0
* revert changes to WORKSPACE
* fix typo
* fix mypy error
* revert local change for debugging
  Co-authored-by: Anthonios Partheniou <partheniou@google.com>
tswast
reviewed
Dec 14, 2022
    polling = polling.with_timeout(timeout)

    try:
        kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
Member
Also for python-aiplatform: googleapis/python-aiplatform#1870
Contributor
Author
That retry logic line never worked (the retry had been i. What broke you is most likely the new default timeout value (instead of None).
copybara-service bot
pushed a commit
to kubeflow/pipelines
that referenced
this pull request
Feb 7, 2023
fix(components): Limit google-api-core version to avoid timeout introduced in googleapis/python-api-core#462 PiperOrigin-RevId: 507613610
copybara-service bot
pushed a commit
to kubeflow/pipelines
that referenced
this pull request
Feb 9, 2023
fix(components): Limit google-api-core version to avoid timeout introduced in googleapis/python-api-core#462 PiperOrigin-RevId: 507613610
copybara-service bot
pushed a commit
to kubeflow/pipelines
that referenced
this pull request
Feb 9, 2023
fix(components): Limit google-api-core version to avoid timeout introduced in googleapis/python-api-core#462 PiperOrigin-RevId: 508463716
It seems like this merge broke using freezegun, particularly with google-cloud-datastore in unit testing. Here's the stack trace: Would you expect that to happen? Any workarounds?
This was referenced May 30, 2025
parthea
added a commit
to googleapis/google-cloud-python
that referenced
this pull request
Nov 24, 2025
* fix: Fix timeout default values
  Also fix LRO for REST transport. This PR makes generated gapics honor the timeout values from grpc_service_config.json instead of overriding them with None (which means no timeout). It is basically a direct fix for googleapis/gapic-generator-python#1477. This PR depends on googleapis/python-api-core#462, and expects the `setup.py.j2` templates to be updated after googleapis/python-api-core#462 gets pushed and released with a new version.
* rename uri_prefix to path_prefix to match corresponding python-api-core change
* fix unnecessary `gapic_v1.method.DEFAULT` in rest stubs
* fix(deps): require google-api-core >=1.34.0
* fix(deps): require google-api-core >=2.11.0
* revert changes to WORKSPACE
* fix typo
* fix mypy error
* revert local change for debugging
  Co-authored-by: Anthonios Partheniou <partheniou@google.com>
If you are a user of python-api-core and experience polling timing out after ~15 minutes (900s) in your Python code after this change, please make sure that instead of calling `PollingFuture.result()` you call it with an additional `timeout` argument, like this: `PollingFuture.result(timeout=<desired timeout in seconds>)` or `PollingFuture.result(timeout=None)` (for an infinite timeout, although infinite timeouts are strongly discouraged), for your code to stop timing out at 900s.

The core libraries cannot and should not have infinite polling as the default behavior. The fact that Python had it like that for years (unlike other GCP-supported languages) was a bug, as it contradicts the original cross-language design of LRO methods in GAPIC libraries, and in general users are not supposed to be put into an infinite-loop scenario implicitly. If an LRO can actually run for hours or even days and that is OK, then users are expected to acknowledge this fact by providing the huge timeouts explicitly.
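As a hedged illustration of the advice above, the sketch below wraps the `result(timeout=...)` call. The helper name `wait_for_operation` is hypothetical, and the sketch assumes the polling timeout surfaces as `concurrent.futures.TimeoutError`; check the `PollingFuture.result()` documentation of your installed version before relying on that.

```python
import concurrent.futures


def wait_for_operation(operation, timeout_seconds):
    """Wait for a long-running operation with an explicit polling timeout.

    `operation` is assumed to expose `result(timeout=...)`, as
    `PollingFuture` does. Returns `(True, result)` on completion and
    `(False, None)` if polling gave up before the operation finished.
    """
    try:
        return True, operation.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # Assumption: polling timeouts are reported with this exception.
        # The operation itself may still be running on the server side.
        return False, None
```

A caller that knows an operation may legitimately take hours would pass a correspondingly large `timeout_seconds` instead of relying on the 900s default.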
This is in response to https://freeman.vc/notes/aws-vs-gcp-reliability-is-wildly-different, which triggered an investigation of the whole Polling/Retry/Timeout behavior in Python GAPIC clients and revealed many fundamental flaws in its implementation.
To properly describe the refactoring in this PR, we need to stick to rigorous terminology, as vague definitions of retries, timeouts, polling and related concepts seem to be the main source of the present bugs and of the overall confusion among both users and creators of the library. Please check the updated (in this PR) documentation of the `google.api_core.retry.Retry` class and the `google.api_core.future.polling.PollingFuture.result()` method for the proper definitions and context.

Note: the overall semantics around Polling, Retry and Timeout remain quite confusing even after the refactoring (although they are now more or less rigorously defined), but it is as clean as I could make it while still maintaining backward compatibility of the whole library.

The quick summary of the changes in this PR:

* Properly define and fix the application of the Deadline and Timeout concepts. Please check the updated documentation for the `google.api_core.retry.Retry` class for the actual definitions. Originally `deadline` was used to represent timeouts, conflating the two concepts. As a result, this PR replaces `deadline` arguments with `timeout` ones in as backward-compatible a manner as possible (i.e. backward compatible in all practical applications).
* Properly define RPC Timeout, Retry Timeout and Polling Timeout, and how a generic Timeout concept (aka Logical Timeout) is mapped to one of those depending on the context. Please check the `google.api_core.retry.Retry` class documentation for details.
* Properly define and fix the application of the Retry and Polling concepts. Please check the updated documentation for `google.api_core.future.polling.PollingFuture.result()` for details.
* Separate the `retry` and `polling` configurations for the Polling future, as these are two different concepts (although both operate on the `Retry` class). Originally both retry and polling configurations were controlled by a single `retry` parameter, merging the configuration of how "rpc error responses" and how "operation not completed" responses are supposed to be handled.
* For the config properties `Retry` (including `Retry Timeout`), `Polling` (including `Polling Timeout`) and `RPC Timeout`, fix and properly define how each of them gets configured and which config takes precedence in case of a conflict (check the `PollingFuture.result()` method documentation for details). Each of those properties can be specified as follows: directly provided by the user for each call, specified at gapic generation time from config values in the `grpc_service_config.json` file (for Retry and RPC Timeout) and the `gapic.yaml` file (for Polling Timeout), or provided as hard-coded basic default values in the python-api-core library itself. This also includes fixing the per-call polling config propagation logic (the polling/retry configs supplied to `PollingFuture.result()` used to be ignored for the actual call).
* Deprecate `ExponentialTimeout`, `ConstantTimeout` and related logic, as those are outdated concepts and are not consistent with the other GAPIC languages. Replace them with `TimeToDeadlineTimeout` to be consistent with how the rest of the languages do it.
* Deprecate `google.api_core.operations_v1.config`, as it is an outdated concept and self-inconsistent (all gapic clients provide configuration in code). The configs are directly provided in code instead.
* Switch the randomized delay calculation from `delay` being treated as the expected value of `randomized_delay` to `delay` being treated as the maximum value of `randomized_delay` (i.e. the new expected value of `randomized_delay` is `delay / 2`). See the `exponential_sleep_generator()` method implementation for details. This is needed to make the Python implementation of exponential backoff for retries and polling consistent with the rest of the GAPIC languages. Also fix the uncontrollable growth of the `delay` value: since it is subject to exponential growth, `delay` was quickly reaching the "infinity" value, and the whole thing was not failing only because Python is a very forgiving language that allows multiplying "infinity" by a number (`inf * number = inf`) instead of overflowing to a (most likely) negative number. Also essentially roll back the 52f12af change, since it is inconsistent with the other languages and damages the uniform distribution of retry delays, artificially shifting their concentration towards the end of the timeout.
* Fix URL construction in `OperationsRestTransport`. Without this fix the polling logic for the REST transport was completely broken (this does not affect the Compute client, as that one has custom LRO).
* Last but not least: change the default values for the Polling logic to the following: `initial=1.0` (same as before), `maximum=20.0` (was `60`), `multiplier=1.5` (was `2.0`), `timeout=900` (was `120`, but due to the timeout resolution logic was actually `None`, i.e. infinity). This, in conjunction with the changed calculation of the randomized delay (its expected value now being `delay / 2`), makes the polling logic much less aggressive in increasing the delays between polling iterations, so LROs return much earlier for users on average, while still keeping a healthy balance between the strain put on both client and server by polling and the responsiveness of LROs for the user.

*The design doc summarising all the changes and reasons for them is in progress.
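The "time to deadline" idea behind the `TimeToDeadlineTimeout` replacement can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the library's implementation: the helper name `with_time_to_deadline` and its `clock` parameter are hypothetical, and the real class may handle edge cases (such as an already exhausted budget) differently. The point is that each attempt receives only the time remaining until one overall deadline, rather than a fresh or exponentially growing per-attempt timeout.

```python
import functools
import time


def with_time_to_deadline(func, overall_timeout, clock=time.monotonic):
    """Wrap `func` so each call gets the time left until a fixed deadline.

    The deadline is fixed once, when the wrapper is created. Every call
    then passes `timeout=<seconds remaining>` to `func`, never less
    than zero.
    """
    deadline = clock() + overall_timeout

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        kwargs["timeout"] = max(deadline - clock(), 0.0)
        return func(*args, **kwargs)

    return wrapper


def fake_rpc(request, timeout=None):
    """Stand-in for an RPC stub; it just reports the timeout it was given."""
    return f"{request}: timeout={timeout:.1f}s"


call = with_time_to_deadline(fake_rpc, overall_timeout=30.0)
print(call("first attempt"))
print(call("second attempt"))
```

Successive attempts therefore share a single 30-second budget, which is the sense in which a timeout (a duration) is turned into a deadline (a point in time).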
In addition to the timeout/retry fixes, this PR has some other unrelated technical fixes:

* `Python 3.10` as the default Python version used in CI.